ADAPTIVE ESTIMATION OF AND ORACLE INEQUALITIES FOR
PROBABILITY DENSITIES AND CHARACTERISTIC
FUNCTIONS

By Sam Efromovich 

University of Texas at Dallas 

The theory of adaptive estimation and oracle inequalities for the
case of Gaussian-shift-finite-interval experiments has made
significant progress in recent years. In particular, sharp-minimax
adaptive estimators and exact exponential-type oracle inequalities
have been suggested for a vast set of functions including analytic and
Sobolev with any positive index as well as for Efromovich-Pinsker and
Stein blockwise-shrinkage estimators. Is it possible to obtain similar
results for a more interesting applied problem of density estimation
and/or the dual problem of characteristic function estimation? The
answer is "yes." In particular, the obtained results include exact
exponential-type oracle inequalities which make it possible to
consider, for the first time in the literature, a simultaneous
sharp-minimax estimation of Sobolev densities with any positive index
(not necessarily larger than 1/2), infinitely differentiable densities
(including analytic, entire and stable), as well as of not absolutely
integrable characteristic functions. The same adaptive estimator is
also rate minimax over a familiar class of distributions with bounded
spectrum where the density and the characteristic function can be
estimated with the parametric rate.

1. Introduction. Univariate probability density estimation is one of the
fundamental topics in applied and mathematical statistics, and it is not
surprising that the first theoretical results about rate-optimal
estimation of nonparametric functions were obtained for this statistical
model; the interested reader is referred to a discussion in books
[9, 14, 45, 49, 51]. An important step in the theory of nonparametric
density estimation was made by Nussbaum [42] who established that, for
the case of a finite-support density and a bounded loss function, there
exists an asymptotic equivalence between the density model and a
Gaussian-shift-finite-interval experiment; the interested reader can
find more about the equivalence and a review of latest results in [5].
Because a Gaussian-shift model is simpler to work with, over the last
decade nonparametric research has been primarily devoted to the
Gaussian-shift experiment, and a vast set of pioneering results,
specifically in the area of adaptive estimation and oracle inequalities,
has been obtained; see a discussion in [6, 8, 15, 21, 24, 37, 39, 44, 54].

Due to Nussbaum's equivalence paradigm, there is a belief in the
nonparametric literature that known adaptive estimators and oracle
inequalities for a Gaussian-shift-finite-interval experiment may guide
the creation of similar results for density estimation. This article
shows that this belief is valid, and it develops a theory of adaptive
estimation of and oracle inequalities for the probability density which
matches recently obtained results for Gaussian-shift models. Moreover,
it is possible to consider densities with both finite and infinite
supports while the equivalence theory exists only for the density with a
finite support, and the article also explores estimation of
characteristic functions.

There are many applications of the obtained results. In particular,
exponential-type oracle inequalities allow the statistician to consider
a vast portfolio of blocks and thresholds including the smaller blocks
suggested in the Gaussian-shift literature. The article also solves a
long-standing (more than two decades) problem of adaptive-sharp-minimax
estimation of densities with a positive Sobolev index. Let us recall
that, under the mean integrated squared error (MISE) criterion, so far
only densities with Sobolev index larger than 1/2 have been studied in
the sharp-minimax literature; see a discussion in
[3, 16, 18, 19, 23, 29, 32, 46, 47, 48, 50]. Note that, according to
[17], no such restriction exists for a Gaussian-shift experiment.
Interestingly, the asymptotic nonequivalence between the two models is
valid whenever the index is at most 1/2, and for years this fact has
served as a pleasing justification of the absence of a theory of sharp
adaptive estimation of those rougher densities; see a discussion in
[4, 19]. This article shows that, fortunately, the nonequivalence does
not affect the studied adaptive density estimation under the MISE
criterion. Another important application is the possibility to consider
distributions with not absolutely integrable (but square-integrable)
characteristic functions, which have never before been studied in the
literature, and then suggest oracle inequalities for and sharp-minimax
estimators of such characteristic functions. Further, for the first time
in the literature a data-driven procedure for estimation of densities
supported on a real line is suggested which is simultaneously sharp
minimax over Sobolev (of any order) and infinitely differentiable
densities (including entire densities like normal and their mixtures or
analytic densities like Cauchy and their mixtures). Moreover, the
suggested estimator implies the parametric rate of convergence for
classical distributions with bounded spectrum (whose Fourier transform
has a finite support).

The content of the article is as follows. To make the paper shorter,
results are presented for densities supported on a real line (technical
report [22] contains results for the finite support). Section 2 presents
a short review of relevant results for the case of a Gaussian-shift
experiment; these are the results to match. Section 3 presents the EP
estimators for density and characteristic functions. Section 4 presents
new oracle inequalities. Section 5 explores minimaxity of the estimator.
The Stein density estimator, based on the famous Stein shrinkage
procedure, is explored in Section 6; it is shown that, under a mild
assumption, Stein and EP estimators have similar asymptotic properties.
Discussion of results is deferred until Section 7. Section 8 contains
proofs; some of its technically involved parts, including new moment and
exponential inequalities for Sobolev statistics, are placed in the
Appendix.

In what follows $C$'s denote generic positive constants and $o_s(1)$'s
denote generic finite sequences which vanish as $s \to \infty$.

2. Review of relevant results for a Gaussian-shift experiment. Consider
a Gaussian-shift-finite-interval experiment
$dY(t) = f(t)\,dt + n^{-1/2}\,dB(t)$, $0 \le t \le 1$, where $Y(t)$ is
an observed signal, $f$ is an unknown square-integrable signal/shift of
interest, $B(t)$ is a standard Brownian motion and $n$ is a positive
integer which later will denote the sample size in a density model.
Note that another customarily used name for the problem is filtering a
signal from white Gaussian noise. Traditionally the model is rewritten
in Fourier, wavelet or any other orthogonal basis domain; then an
equivalent sequence model is considered:

(2.1)  $y_j = \theta_j + n^{-1/2} Z_j$,  $j = 1, 2, \ldots$,

where $Z_j$ are independent standard Gaussian random variables,
$\theta = \{\theta_1, \theta_2, \ldots\}$ is an unknown vector-parameter
of interest, and
$\int_0^1 f^2(t)\,dt = \sum_{j=1}^{\infty} \theta_j^2 =: \|\theta\|^2 < \infty$.
The interested reader can find a comprehensive discussion of the
sequence model (2.1) in [36]. The Efromovich-Pinsker (EP)
blockwise-shrinkage estimator is defined as

(2.2)  $\hat\theta_j := \sum_{k=1}^{K} \hat\mu_k\, y_j\, I(j \in B_k)$,

where the shrinkage (smoothing) coefficients/weights are

(2.3)  $\hat\mu_k := \dfrac{\|y\|_k^2 - L_k n^{-1}}{\|y\|_k^2}\, I(\|y\|_k^2 > (1 + t_k) L_k n^{-1})$,

$I(\cdot)$ is the indicator, $\{1 = b_1 < b_2 < \cdots\}$ is a given
sequence of positive integers and then
$B_k := \{b_k, b_k + 1, \ldots, b_{k+1} - 1\}$ and
$L_k := b_{k+1} - b_k$ are corresponding blocks and their lengths,
$t_k > 0$ are thresholds (some authors refer to them as a penalty),
$\|y\|_k^2 := \sum_{j \in B_k} y_j^2$ and this statistic is often
referred to as a Sobolev statistic, and an integer $K = K(n)$ is a
cutoff defined from the relation
$\sum_{k=1}^{K} L_k \le n^{1 - 1/\ln(n+1)} < \sum_{k=1}^{K+1} L_k$
(see a comment on this choice in Section 7). The risk
$E\|\mu_k y - \theta\|_k^2$ is minimized by a shrinkage coefficient
(oracle)

(2.4)  $\mu_k := \dfrac{\|\theta\|_k^2}{\|\theta\|_k^2 + L_k n^{-1}}$,

which depends on a quantity $\|\theta\|_k^2 := \sum_{j \in B_k} \theta_j^2$
(so-called Sobolev functional) unavailable to the statistician. Then
$\theta^* := (\theta_1^*, \theta_2^*, \ldots)$ with
$\theta_j^* := \mu_k y_j$, $j \in B_k$, can serve as a (linear)
blockwise-shrinkage oracle which, in its turn, is a blockwise version of
the famous Wiener filter. The oracle has excellent minimax properties;
in particular, under a mild assumption on blocks and thresholds this
oracle is simultaneously sharp minimax over Sobolev and analytic
function classes; the interested reader can find a discussion in
[17, 19, 36, 52, 53]. Then it is natural to use the mean squared error
[or mean integrated squared error (MISE) for the dual filtering problem]
of this oracle as a benchmark for the risk of any blockwise-shrinkage
estimator. A simple calculation yields that the oracle's risk is

(2.5)  $E\|\theta^* - \theta\|^2 = \sum_{k=1}^{K} \sum_{j \in B_k} E(\mu_k y_j - \theta_j)^2 + \sum_{k > K} \|\theta\|_k^2$

       $= n^{-1} \sum_{k=1}^{K} L_k \mu_k + \sum_{k > K} \|\theta\|_k^2$.
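The construction in (2.2)-(2.5) can be illustrated numerically. The following is a minimal sketch, not the author's code: the function names `ep_estimate` and `oracle_weight`, the 0-based indexing and the representation of blocks by a list of start indices are assumptions made for illustration only. It applies the data-driven weight (2.3) block by block and computes the oracle weight (2.4).

```python
def ep_estimate(y, block_starts, thresholds, n):
    """Blockwise shrinkage in the spirit of (2.2)-(2.3).

    y            : observations of the sequence model (0-based list)
    block_starts : hypothetical layout, start index of each block plus a final end index
    thresholds   : t_k > 0, one per block
    n            : each observation has noise variance 1/n
    """
    theta_hat = [0.0] * len(y)
    for k in range(len(block_starts) - 1):
        lo, hi = block_starts[k], block_starts[k + 1]
        length = hi - lo                                  # L_k
        energy = sum(v * v for v in y[lo:hi])             # Sobolev statistic ||y||_k^2
        if energy > (1.0 + thresholds[k]) * length / n:   # indicator in (2.3)
            weight = (energy - length / n) / energy
        else:
            weight = 0.0
        for j in range(lo, hi):
            theta_hat[j] = weight * y[j]
    return theta_hat


def oracle_weight(theta_block, n):
    """Oracle shrinkage coefficient (2.4) for one block of true coefficients."""
    energy = sum(v * v for v in theta_block)              # Sobolev functional
    return energy / (energy + len(theta_block) / n)


# Two blocks of length 2: the first carries signal, the second is below threshold.
estimate = ep_estimate([3.0, 4.0, 0.1, -0.1], [0, 2, 4], [0.5, 0.5], 1)
print(estimate)
```

In this toy run the first block is kept with weight (25 - 2)/25 and the second block is set to zero. For a single block one can also check directly that the oracle's risk $\mu_k^2 L_k n^{-1} + (1 - \mu_k)^2 \|\theta\|_k^2$ collapses to $n^{-1} L_k \mu_k$, which is the per-block term in (2.5).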

Now we can formulate a known technical result which will imply oracle
inequalities of interest. To do this, let us recall the Stirling formula
for the Gamma function $\Gamma(L/2)$, $L = 1, 2, \ldots$ (see [1]),

(2.6)  $s_L^* := \dfrac{\Gamma(L/2)}{(2\pi)^{1/2} e^{-L/2} (L/2)^{(L/2) - 1/2}} \le s_1^*$,  $s_L^* \to 1$ as $L \to \infty$.
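To get a feeling for how quickly the Stirling ratio approaches 1, one may evaluate it numerically. The sketch below assumes that $s_L^*$ is the ratio of $\Gamma(L/2)$ to its leading Stirling approximation $(2\pi)^{1/2} e^{-L/2} (L/2)^{(L/2)-1/2}$; the function name `stirling_ratio` is hypothetical.

```python
import math


def stirling_ratio(L):
    """Gamma(L/2) divided by its leading Stirling approximation, computed on the log scale."""
    z = L / 2.0
    log_stirling = 0.5 * math.log(2.0 * math.pi) - z + (z - 0.5) * math.log(z)
    return math.exp(math.lgamma(z) - log_stirling)


for L in (1, 2, 10, 100, 1000):
    print(L, stirling_ratio(L))
```

Working with `math.lgamma` avoids overflow of the Gamma function for long blocks. Under the stated assumption the ratio at $L = 1$ equals $e^{1/2}/2^{1/2}$ and the values decrease toward 1 as $L$ grows.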

Lemma 2.1 ([21]). Consider a particular block $B_k$ and assume that
$0 < t_k < 1$. Then there exists an absolute constant $C_0$ such that
for any $q_k \in [1/4, \min(1, 1/4t_k))$ and any $\nu_k > 0$ the EP
estimator satisfies

(2.7)  $E\|\hat\theta - \theta\|_k^2 \le E\|\theta^* - \theta\|_k^2 + n^{-1} L_k [\mu_k D_k^* + D_k^{**}]$,
I < 2L k t k n- 1 )}



and 




(1 + u^)L^[Ll/ 2 /s* Lk + 8((L k t k )-^ + (L.tl)- 1 ' 2 )} 



(2.9) 



x ex.p{-L k [q k t k - ln(l + q k t k )]/2} 



Remark 2.1. The condition $t_k \to 0$, $k \to \infty$ is necessary for
the estimate to sharply mimic the oracle's risk; this explains why only
the case $t_k < 1$ is considered in Lemma 2.1; see [24]. Further, it is
easy to recognize that in (2.7) the term $D_k^{**}$ plays a more
important role than $D_k^*$; indeed $D_k^{**}$ defines the remainder in
the oracle inequality while $D_k^*$ defines the multiplicative factor.
As a result, for mimicking the oracle's risk the term $D_k^{**}$ should
vanish with an appropriate rate while the term $D_k^*$ may vanish with
any rate as $k \to \infty$. Further, the following lower bound of [21]:

(2.10)  $E\|\hat\theta - \theta\|_k^2 \ge \cdots \times \exp\{-L_k [t_k - \ln(1 + t_k)]/2\}$

allows one to appreciate the accuracy of the exponential factor in
$D_k^{**}$ [compare exponential factors in (2.9) and (2.10)]. The
exponential factor in $D_k^{**}$ is critical because it allows one to
use smaller blocks; see a discussion in [6, 7, 8, 12, 19].

Now we can formulate several types of oracle inequalities suggested in
the literature and based on Lemma 2.1. These are the results to match
for density estimation.

Theorem 2.1 ([21]). Suppose that the assumption of Lemma 2.1 holds
for all $k \in \{1, 2, \ldots, K\}$. Then:

(a) Risk of the EP estimate is bounded from above by the following
oracle inequality:

(2.11)  $E\|\hat\theta - \theta\|^2 \le E\|\theta^* - \theta\|^2 + n^{-1} \sum_{k=1}^{K} L_k [\mu_k D_k^* + D_k^{**}]$.

(b) Denote A m := max m < k < K D* k , S m := J2k= m L kD* k * and T c 
{k:n k Dl+D* k * > 1}. Then 



E\\6- 6\\ 2 < min 

\<m<K 



(2.12) 



m— 1 



(1 + A m )E\\6* -e\\ 2 + n- 1 (s m + J2 L k 



k=l 



+ W 1 £ L k \jM k Dt + D?], 

where by convention Ylj=i = 0- 

(c) Set $T := \{k : D_k^* + D_k^{**} > 1\}$ with $D_k^*$ and $D_k^{**}$
defined as in (2.8) and (2.9) only with $\mu_k$ and indicator functions
replaced by 1, and then, following part (b), define corresponding
$A_m$, $S_m$ and $T_0$. Also, let us modify the EP estimator
$\hat\theta_j$ by considering $\tilde\theta_j := y_j$ for $j \in B_k$,
$k \in T_0$ and $\tilde\theta_j := \hat\theta_j$ otherwise. Then



E\\6-6\\ 2 < min 



m— 1 



1 + A m )E\\9* - 9\\ 2 + n" 1 5 m + J2 L k 

V fe=i / 



l<m<if 

(2.13) 

+ n ^ L k . 

ket 

3. EP density and characteristic function estimators. Suppose that
$X_1, \ldots, X_n$, $n > 3$ are i.i.d. realizations from an unknown
density $f(x)$, $x \in (-\infty, \infty)$, that is square integrable on
the real line; it is not assumed that the density is positive on the
real line. Let us recall that

(3.1)  $f(x) = (2\pi)^{-1} \int_{-\infty}^{\infty} h(u) e^{-iux}\,du$,  $x \in (-\infty, \infty)$,

where

(3.2)  $h(u) := E\{e^{iuX}\} = \int_{-\infty}^{\infty} e^{iux} f(x)\,dx$,  $u \in (-\infty, \infty)$

is the characteristic function corresponding to $f$. If the
characteristic function is not absolutely integrable, then the inverse
formula (3.1) is understood in the sense of Plancherel's theorem. The
problem is to estimate the density and the characteristic function under
the MISE criterion.

Recall that the characteristic function satisfies
$h(-u) = \overline{h(u)}$, the complex conjugate of $h(u)$. Thus we can
consider only $h(u)$, $u \in [0, \infty)$ and then
$f(x) = \pi^{-1} \int_0^{\infty} \mathrm{Re}\{h(u) e^{-iux}\}\,du$. Now
we are following the construction of the EP estimator for the
Gaussian-shift case. We divide a half-line $[0, \infty)$ into a sequence
of nonoverlapping blocks (intervals) $B_k := [b'_k, b'_{k+1})$,
$0 = b'_1 < b'_2 < \cdots$ with the corresponding lengths
$L_k := b'_{k+1} - b'_k = \int_{B_k} du$. Then the following abuse of
the previous notation will be handy. Set

(3.3)  $\|y\|_k^2 := \int_{B_k} |\hat h(u)|^2\,du$,

where

(3.4)  $\hat h(u) := n^{-1} \sum_{l=1}^{n} \exp\{iuX_l\}$

is the empirical characteristic function estimator. Then we define an EP
density estimator as

(3.5)  $\tilde f(x) := \pi^{-1} \int_0^{\infty} \mathrm{Re}\{\tilde h(u) e^{-iux}\}\,du$,  $x \in (-\infty, \infty)$.

Here

(3.6)  $\tilde h(u) := \sum_{k=1}^{K} \hat\mu_k \hat h(u) I(u \in B_k)$,  $u \ge 0$

is the EP characteristic function estimator, and $\hat\mu_k$ is defined
in (2.3). To make the similarity complete, we denote

(3.7)  $\|\hat\theta\|_k^2 := \int_{B_k} |\tilde h(u)|^2\,du$,  $\|\theta\|_k^2 := \int_{B_k} |h(u)|^2\,du$

and

(3.8)  $\|\hat\theta - \theta\|_k^2 := \int_{B_k} |\tilde h(u) - h(u)|^2\,du$.

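To shed light on the above-introduced notation, note that according to
Plancherel's identity the MISE of EP density estimator (3.5) can be
written as

(3.9)  $E \int_{-\infty}^{\infty} (\tilde f(x) - f(x))^2\,dx = \pi^{-1} E \int_0^{\infty} |\tilde h(u) - h(u)|^2\,du = \pi^{-1} \sum_{k=1}^{\infty} E\|\hat\theta - \theta\|_k^2$.

Further, using (2.4) we define the corresponding oracles $f^*(x)$ and
$h^*(u)$ as

(3.10)  $f^*(x) := \pi^{-1} \sum_{k=1}^{K} \mu_k \int_{B_k} \mathrm{Re}\{\hat h(u) e^{-iux}\}\,du$,  $x \in (-\infty, \infty)$,

(3.11)  $h^*(u) := \sum_{k=1}^{K} \mu_k \hat h(u) I(u \in B_k)$,  $u \ge 0$.

Also we set $\|\theta^*\|_k^2 := \int_{B_k} |h^*(u)|^2\,du$ and
$\|\theta^* - \theta\|_k^2 := \int_{B_k} |h^*(u) - h(u)|^2\,du$.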

Finally, if an EP density estimate (or the oracle) takes on negative
values, then its nonnegative projection may be considered; see Section
3.1 in [19]. Further, if a monotonicity assumption is known, then
methods of Efromovich [20] can be used.

Remark 3.1. According to (3.9), for both the density and characteristic
function settings, it suffices to present bounds on their MISEs via
$E\|\hat\theta - \theta\|^2 := E \sum_{k=1}^{\infty} \|\hat\theta - \theta\|_k^2$.
This approach will be used in Section 4. Also, to avoid any confusion
with the Gaussian-shift case, we shall refer to the above-introduced
$\hat\theta$ as the EP density-model estimator.
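The estimator of this section can be sketched in a few lines of code. What follows is an illustration under stated assumptions rather than the author's implementation: integrals over blocks are replaced by a midpoint rule, the block edges and the constant thresholds are arbitrary choices, and the names `empirical_cf` and `ep_density_estimator` are hypothetical. The empirical characteristic function, the block energy $\|y\|_k^2$, the data-driven weight and the inversion step follow the definitions of this section and of (2.3).

```python
import cmath
import math
import random


def empirical_cf(u, sample):
    """Empirical characteristic function at frequency u."""
    return sum(cmath.exp(1j * u * x) for x in sample) / len(sample)


def ep_density_estimator(sample, edges, thresholds, nodes_per_block=20):
    """Return (weights, f_tilde) for blocks [edges[k], edges[k+1]).

    Block integrals use a midpoint rule; the quadrature and its
    resolution are illustrative assumptions.
    """
    n = len(sample)
    weights, pieces = [], []
    for k in range(len(edges) - 1):
        length = edges[k + 1] - edges[k]                      # L_k
        du = length / nodes_per_block
        nodes = [edges[k] + (i + 0.5) * du for i in range(nodes_per_block)]
        values = [empirical_cf(u, sample) for u in nodes]
        energy = sum(abs(v) ** 2 for v in values) * du        # block energy ||y||_k^2
        if energy > (1.0 + thresholds[k]) * length / n:       # indicator in (2.3)
            weight = (energy - length / n) / energy
        else:
            weight = 0.0
        weights.append(weight)
        pieces.append((weight, nodes, values, du))

    def f_tilde(x):
        """Midpoint-rule inversion of the shrunken empirical characteristic function."""
        total = 0.0
        for weight, nodes, values, du in pieces:
            if weight > 0.0:
                total += weight * du * sum(
                    (v * cmath.exp(-1j * u * x)).real
                    for u, v in zip(nodes, values)
                )
        return total / math.pi

    return weights, f_tilde


random.seed(0)
sample = [random.gauss(0.0, 1.0) for _ in range(400)]
edges = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]                   # hypothetical unit blocks
weights, f_tilde = ep_density_estimator(sample, edges, [0.5] * 6)
print([round(w, 3) for w in weights], round(f_tilde(0.0), 3))
```

For a standard normal sample the low-frequency blocks are kept almost unshrunk, while high-frequency blocks, where the empirical characteristic function is mostly noise, tend to be shrunk heavily or removed.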

4. Exponential-type oracle inequality for EP estimator. In what
follows $c_1$ denotes the universal positive constant $C_2$ of de la
Peña and Montgomery-Smith [13], $c_2$ denotes the universal positive
constant $K$ in the Bernstein-type inequality (3.18) of Giné, Latała and
Zinn [27],
$d := d(f) := \int_{-\infty}^{\infty} |h(u)|^2\,du = 2\pi \int_{-\infty}^{\infty} f^2(x)\,dx$,
$d^* := d^*(f, L) := \min_{z > 0} (z + L z^{-1} \int_{\{x : f(x) > z\}} f^2(x)\,dx)$,
and for a $k$th block

Ai := \i(L k ,tk,d,d*) 

:= (d4c2)-\l - min(l/2, ^ /4 )) 2 (1 - (L k + l)' 1 / 2 ) 2 

1 



(4.1) x min 



[1 + An-H k (2d- 1 / 2 + Zn^d-Hk)] ' 
c\d 



t k [8d*(f,L k ) + 3(n~ 1 L k t k y/ 2 Y 

riK^ 4 L~ 5/2 ] 1/3 cfW^N 
[(2d)V2 + 20n-i* fc ]i/3' 2t 3/2 Lfe J> 

(4.2) X 2 :=X 2 (L k ,t k ,d) 

_ nmin(l/4,4 /2 ) (1 - min(l/2,# 4 )) 2 (l - (L k + l)' 1 / 2 ) 2 
L k t k c\ 3c^ + 2dL^\ 1 + 8n~ 1 (2d 1 / 2 + t k ) 

min(l/4,tf) (l-(L fc + l)-V2)2 



(4.3) A 3 :=A 3 (L fc ,t fc ,a!): 



Theorem 4.1. Suppose that $X_1, \ldots, X_n$, $n > 3$ are i.i.d.
according to a square-integrable density $f \in L_2(-\infty, \infty)$.
Consider a particular block $B_k$ with length $L_k > 0$ and a particular
threshold level $t_k > 0$. Then for any $\nu_k \in (0, 1)$ the following
oracle inequality holds for the EP density-model estimator defined in
Section 3:

(4.4)  $E\|\hat\theta - \theta\|_k^2 \le E\|\theta^* - \theta\|_k^2 + n^{-1} L_k [\mu_k D'_k + D''_k]$,

where

(4.5)  $E\|\theta^* - \theta\|_k^2 = n^{-1} L_k \mu_k [1 - \mu_k L_k^{-1} \|\theta\|_k^2]$,

D , k :=u k (l-^ 1 \\9\\ 2 ) + (1 + ^ 1 ) 
(4.6) 



x [L-VfyM 1 / 2 + 3d(l + L~ 1/2 )(l + 1," 1 )) 



(4.8) 



(4.7) 



G{L k ,t k 



d,d*) 




+ min( Mfe (l + t k ),2t h )I(\\0\\i < 2L k t k n~ 1 )} 
(l + v^)[L^(d + 3dV\)]V 2 

x G(L k ,t k ,d,d*)I(\\9\\ 2 k < L]l\ k n-\ 
[cic 2 exp{-^L fc Ai} 

+ 2ci exp{-^L fe A 2 } + exp{-tlL fc A 3 }] 1/2 . 



Theorem 4.1 implies a result which matches Theorem 2.1.

Corollary 4.1. Let the assumption of Theorem 4.1 hold for all
$k \in \{1, 2, \ldots, K\}$. Then assertions (a)-(c) of Theorem 2.1 are
valid for the EP density-model estimator with $D_k^*$ and $D_k^{**}$
replaced by $D'_k$ and $D''_k$, respectively.

These results yield two important conclusions: (i) It is possible to
suggest identical blockwise-shrinkage estimators for the Gaussian-shift,
density and characteristic function estimation models. (ii) The MISEs of
those data-driven estimators satisfy similar exponential-type oracle
inequalities.

Remark 4.1. While there is a difference between the density-model
oracle's error $E\|\theta^* - \theta\|_k^2$, presented in (4.5), and the
corresponding Gaussian-shift oracle's error
$E\|\theta^* - \theta\|_k^2 = n^{-1} L_k \mu_k$, this difference bears
no consequences for nonparametric cases where the MISE vanishes more
slowly than $n^{-1}$. The latter is based on a plain observation that
for any square-integrable density the term
$\mu_k L_k^{-1} \|\theta\|_k^2$ vanishes as $k \to \infty$; further,
note that $\mu_k L_k^{-1} \|\theta\|_k^2 < L_k^{-1} \pi d$ and if the
statistician uses blocks satisfying $L_k > L(n) \to \infty$,
$n \to \infty$, then this term vanishes uniformly over the blocks as
$n \to \infty$.

Remark 4.2. In a majority of asymptotic applications of Theorem 4.1
the main exponential term in (4.8) is the one containing $\lambda_1$.
Further, let us note that
$d^*(f, L) \le 2 \min(\sup_x f(x), (dL)^{1/2})$. This inequality allows
one to analyze $\lambda_1$ for bounded and unbounded densities.

5. Sharp minimaxity. In this section the above-established oracle
inequality is used to prove a simultaneous sharp minimaxity of the EP
density estimate for Sobolev and infinitely differentiable distribution
classes as well as its rate minimaxity for distribution classes with
bounded spectrum where the MISE converges with the parametric rate
$n^{-1}$. The interested reader can find a thorough discussion of these
classes in [3, 19, 30, 31, 32, 33, 34, 35, 36, 38, 40, 51, 55]. Below
these distribution classes are defined via corresponding characteristic
functions which are assumed to be square integrable, and let us recall
that if the characteristic function $h$ belongs to
$L_2(-\infty, \infty)$, then the corresponding cumulative distribution
function is absolutely continuous and its density $f$ belongs to
$L_a(-\infty, \infty)$ for any $1 \le a \le 2$; see Theorem 11.6.1 in
[38].

We consider those distribution classes in turn. Let $\alpha$ and $Q$ be
positive real numbers; then a Sobolev class (of order $\alpha$) is
defined as

(5.1)  $\mathcal{S}(\alpha, Q) := \{ f : \pi^{-1} \int_0^{\infty} (1 + |u|^{2\alpha}) |h(u)|^2\,du \le Q,\ h(u) = \int_{-\infty}^{\infty} f(x) e^{iux}\,dx \}$.

Theorem 5.1 (Sobolev class). Let a sample $X_1, X_2, \ldots, X_n$ of
$n$ i.i.d. observations with a square-integrable density
$f \in L_2(-\infty, \infty)$ be given. Suppose that blocks and
thresholds of EP estimator $\tilde f$, defined in (3.5), satisfy

(5.2)  $L_{k+1}/L_k \to 1$ and $\sup D'_k \to 0$ as $k \to \infty$,  $\sup \sum_{k=1}^{K} L_k D''_k < \delta_n$,

where the supremums are taken over $f \in \mathcal{S}(\alpha, Q)$ and
$\delta_n = n^{o_n(1)}$. Then

(5.3)  $\sup_{f \in \mathcal{S}(\alpha, Q)} \{ E \int_{-\infty}^{\infty} (\tilde f(x) - f(x))^2\,dx \} (1 + o_n(1))$

(5.4)  $= \inf_{\check f} \sup_{f \in \mathcal{S}(\alpha, Q)} E \int_{-\infty}^{\infty} (\check f(x) - f(x))^2\,dx = P(\alpha, Q)\, n^{-2\alpha/(2\alpha+1)} (1 + o_n(1))$,

where in (5.4) the infimum is taken over all possible density estimates
$\check f$ based on the sample and parameters $\alpha$ and $Q$, and
$P(\alpha, Q) := (2\alpha + 1) [\pi (2\alpha + 1)(\alpha + 1) \alpha^{-1}]^{-2\alpha/(2\alpha+1)} Q^{1/(2\alpha+1)}$
is the Pinsker constant.

Let us recall that only Sobolev classes of order $\alpha > 1/2$ have
been considered in the literature so far; see a discussion in
[11, 19, 29, 46, 50].
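To see what the sharp constant means in practice, the leading term of the minimax MISE over a Sobolev class can be evaluated directly. The sketch below assumes the Pinsker constant has the form $P(\alpha, Q) = (2\alpha + 1)[\pi(2\alpha + 1)(\alpha + 1)\alpha^{-1}]^{-2\alpha/(2\alpha+1)} Q^{1/(2\alpha+1)}$ and drops the $1 + o_n(1)$ factor; the function names are hypothetical.

```python
import math


def pinsker_constant(alpha, Q):
    """Pinsker constant P(alpha, Q) for the Sobolev class of order alpha."""
    base = math.pi * (2.0 * alpha + 1.0) * (alpha + 1.0) / alpha
    return ((2.0 * alpha + 1.0)
            * base ** (-2.0 * alpha / (2.0 * alpha + 1.0))
            * Q ** (1.0 / (2.0 * alpha + 1.0)))


def sobolev_minimax_mise(alpha, Q, n):
    """Leading term P(alpha, Q) * n^(-2 alpha / (2 alpha + 1)) of the minimax MISE."""
    return pinsker_constant(alpha, Q) * n ** (-2.0 * alpha / (2.0 * alpha + 1.0))


# Rough densities (alpha <= 1/2) are covered as well as smooth ones.
for alpha in (0.25, 0.5, 1.0, 2.0):
    print(alpha, pinsker_constant(alpha, 1.0), sobolev_minimax_mise(alpha, 1.0, 1000))
```

The exponent $2\alpha/(2\alpha + 1)$ shows how slowly the MISE decreases for small $\alpha$: for $\alpha = 1$ an eightfold increase of the sample size reduces the leading term by a factor of 4.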

Now let us consider another popular (specifically in the literature
devoted to characteristic functions and stable distributions) class of
infinitely differentiable distributions

(5.5)  $\mathcal{A}(r, \gamma, Q) := \{ f : \pi^{-1} \int_0^{\infty} |e^{\gamma u^r} h(u)|^2\,du \le Q,\ h(u) = \int_{-\infty}^{\infty} f(x) e^{iux}\,dx \}$.

Here $\gamma$ and $Q$ are positive real numbers and $r \in (0, 2]$. A
thorough discussion of this class can be found in the classical books
[38, 40, 55] as well as in [2, 19, 33, 34, 35, 39]. This class includes
analytic, stable and entire distributions with more familiar particular
examples being Cauchy mixtures (where $r = 1$) and Normal mixtures
(where $r = 2$).

Theorem 5.2 (Infinitely differentiable class). Let the assumption of
Theorem 5.1 hold with the supremums in (5.2) taken over
$f \in \mathcal{A}(r, \gamma, Q)$ and
$\delta_n = o_n(1) [\ln(n)]^{1/2}$. Then

(5.6)  $\sup_{f \in \mathcal{A}(r, \gamma, Q)} \{ E \int_{-\infty}^{\infty} (\tilde f(x) - f(x))^2\,dx \} (1 + o_n(1))$

(5.7)  $= \inf_{\check f} \sup_{f \in \mathcal{A}(r, \gamma, Q)} E \int_{-\infty}^{\infty} (\check f(x) - f(x))^2\,dx = \pi^{-1} n^{-1} [\ln(n)/(2\gamma)]^{1/r} (1 + o_n(1))$,

where the infimum in (5.7) is taken over all estimates $\check f$ based
on the sample and parameters $(r, \gamma, Q)$.

Finally, let $s$ denote a positive real number, and consider a familiar
class of distributions with bounded spectrum

(5.8)  $\mathcal{B}(s) := \{ f : h(u) = 0,\ |u| > s,\ h(u) = \int_{-\infty}^{\infty} f(x) e^{iux}\,dx \}$.

According to Theorem 11.12.1 in [38], a distribution with bounded
spectrum, which is not from a uniform family, is entire of order 1 and
of exponential type. Then, as it is emphasized by Ibragimov and
Khasminskii [33, 34], we are dealing with an essentially
infinite-dimensional class. Nonetheless, they were the first to
recognize that the sharp-minimax MISE is
$\pi^{-1} s n^{-1} (1 + o_n(1))$, that is, the MISE's convergence is
parametric! The parametric convergence is too fast for the essentially
nonparametric adaptive EP estimator; however, the following result still
holds.

Theorem 5.3 (Bounded spectrum class). Let the assumption of Theorem 5.1
hold with the supremums in (5.2) taken over $f \in \mathcal{B}(s)$ and
$\delta_n = o_s(1) s$. Then

(5.9)  $\sup_{f \in \mathcal{B}(s)} \{ E \int_{-\infty}^{\infty} (\tilde f(x) - f(x))^2\,dx \} (1 + o_n(1) + o_s(1))$

(5.10)  $= \inf_{\check f} \sup_{f \in \mathcal{B}(s)} E \int_{-\infty}^{\infty} (\check f(x) - f(x))^2\,dx = \pi^{-1} s n^{-1} (1 + o_n(1))$,

where the infimum in (5.10) is taken over all estimates $\check f$
based on the sample and parameter $s$.
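The leading terms of the minimax MISE for the last two classes can be compared numerically. This is a small sketch with hypothetical function names; it assumes the leading terms $\pi^{-1} n^{-1} [\ln(n)/(2\gamma)]^{1/r}$ for the infinitely differentiable class and $\pi^{-1} s n^{-1}$ for the bounded spectrum class, and it ignores the $1 + o_n(1)$ factors.

```python
import math


def analytic_minimax_mise(r, gamma, n):
    """Leading term for the infinitely differentiable class with parameters (r, gamma)."""
    return (math.log(n) / (2.0 * gamma)) ** (1.0 / r) / (math.pi * n)


def bounded_spectrum_minimax_mise(s, n):
    """Leading term for the class of distributions with spectrum bounded by s."""
    return s / (math.pi * n)


for n in (10 ** 2, 10 ** 4, 10 ** 6):
    print(n,
          n * analytic_minimax_mise(2.0, 0.5, n),       # grows like a power of ln(n)
          n * bounded_spectrum_minimax_mise(2.0, n))    # stays constant: parametric rate
```

Multiplying by $n$ makes the difference visible: the first product grows logarithmically with $n$, while the second stays equal to $s/\pi$.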

Remark 5.1. Using Remark 4.2, it is plain to verify that a majority of
known portfolios of blocks and thresholds, suggested in the
sharp-minimax Gaussian-shift literature, simultaneously satisfy
conditions of Theorems 5.1-5.3. Just to point to a specific and simple
example with relatively "small" logarithmic blocks, consider
$\{(L_k = \ln^3(k + 3),\ t_k = 1/\ln(\ln(k + 3))),\ k = 1, 2, \ldots\}$.
This portfolio simultaneously satisfies assumptions of Theorems
5.1-5.3.
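The logarithmic portfolio of Remark 5.1 is easy to generate. In the sketch below the function name `logarithmic_portfolio` and the argument `cutoff` (the frequency at which the last block must end or be exceeded) are assumptions made for illustration; the block lengths and thresholds follow the formulas of the remark.

```python
import math


def logarithmic_portfolio(cutoff):
    """Block edges on [0, infinity) and thresholds for L_k = ln^3(k+3), t_k = 1/ln(ln(k+3)).

    Blocks are added until their total length reaches the hypothetical cutoff frequency.
    """
    edges = [0.0]
    thresholds = []
    k = 1
    while edges[-1] < cutoff:
        edges.append(edges[-1] + math.log(k + 3) ** 3)        # L_k
        thresholds.append(1.0 / math.log(math.log(k + 3)))    # t_k
        k += 1
    return edges, thresholds


edges, thresholds = logarithmic_portfolio(50.0)
for k, t in enumerate(thresholds, start=1):
    print(k, round(edges[k] - edges[k - 1], 3), round(t, 3))
```

The printout shows that the blocks grow very slowly and that the thresholds decrease to zero only at an iterated-logarithm rate, so the first few thresholds exceed 1.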

We may conclude that the adaptive EP density (or characteristic
function) estimator is simultaneously sharp minimax over Sobolev and
infinitely differentiable classes of distributions. On top of this nice
property, the adaptive estimator is also rate minimax over distributions
with bounded spectrum, and its MISE attains the parametric-minimax MISE
when the spectrum band increases. To the best of the author's knowledge,
this is the first known example of such simultaneous adaptive density
estimation, as well as the first example of a simultaneous adaptive
sharp-minimax estimation for classes of distributions which include both
absolutely integrable and not absolutely integrable characteristic
functions.

Remark 5.2. Let us note that due to Plancherel's identity, results of
Theorems 5.1-5.3, except for using an extra factor $2\pi$ in the
formulas for minimax MISEs, hold for the dual problem of characteristic
function estimation. As a result, the EP characteristic function
estimator (3.6) is simultaneously sharp minimax over Sobolev and
infinitely differentiable distribution classes, and it is also rate
minimax over classes of distributions with bounded spectrum.

6. Stein estimator. The blockwise-shrinkage literature, devoted to
Gaussian-shift experiments, also explores a Stein (blockwise-shrinkage)
estimator which, using notation of Section 2, can be written as

(6.1)  $\hat\theta_{Sj} := \dfrac{\|y\|_k^2 - (1 + t_k) L_k n^{-1}}{\|y\|_k^2}\, I(\|y\|_k^2 > (1 + t_k) L_k n^{-1})\, y_j$,  $j \in B_k$.

Note that if the EP estimator uses a hard block-thresholding, a Stein
estimator uses a soft one. Then, according to the paradigm of Section 3,
the Stein density estimator can be defined as

(6.2)  $\tilde f_S(x) := \pi^{-1} \int_0^{\infty} \mathrm{Re}\{\tilde h_S(u) e^{-iux}\}\,du$,

where the Stein characteristic function estimator is

(6.3)  $\tilde h_S(u) := \sum_{k=1}^{K} \tilde\mu_k \hat h(u) I(u \in B_k)$,  $u \ge 0$

and, recalling notation (3.3),

(6.4)  $\tilde\mu_k := \dfrac{\|y\|_k^2 - (1 + t_k) L_k n^{-1}}{\|y\|_k^2}\, I(\|y\|_k^2 > (1 + t_k) L_k n^{-1})$.

The following proposition allows one to explore the Stein density (or
characteristic function) estimator via its EP counterpart. Recall that
$G(L, t, d, d^*)$ was defined in (4.8).
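The difference between the hard (EP) and soft (Stein) block-thresholding is easiest to see by comparing the two weights as functions of the block energy. The following sketch uses hypothetical function names and takes the block energy $\|y\|_k^2$ as a given number; the two weights follow (2.3) and the Stein coefficient with the same threshold.

```python
def ep_weight(energy, length, t, n):
    """EP weight (2.3): subtract the noise level L_k/n, keep the block only above the threshold."""
    noise = length / n
    return (energy - noise) / energy if energy > (1.0 + t) * noise else 0.0


def stein_weight(energy, length, t, n):
    """Stein weight: subtract the whole threshold (1 + t_k) L_k/n, so the weight starts at zero."""
    noise = length / n
    return (energy - (1.0 + t) * noise) / energy if energy > (1.0 + t) * noise else 0.0


# Block of length 2, n = 1, threshold t = 0.5: both weights switch on at energy 3.
for energy in (2.9, 3.000000001, 4.0, 10.0, 100.0):
    print(energy, ep_weight(energy, 2, 0.5, 1), stein_weight(energy, 2, 0.5, 1))
```

Right above the threshold the EP weight jumps from 0 to about $t_k/(1 + t_k)$, while the Stein weight increases continuously from 0; for large block energies the two weights are close, which is in line with the similar asymptotic behavior of the two estimators.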

Theorem 6.1. Let $\tilde f$ and $\tilde f_S$ denote EP and Stein
estimators which use the same blocks, thresholds and $K$. Suppose that
the assumption of Theorem 4.1 holds. Then

E (fs(x)-f(x)) 2 dx 

J —oo 

K 
k=l 

x [12L~ 1/2 (1 - (L k + I)- 1 /*)" V /a + dt- k \l + L~ 1/2 )) 

(6.5) 

+ 2t k I(\\e\\ 2 k >(l/2)L k t k n- 1 )} 

K 

+ vr- 1 ™- 1 Y, + t k y l G 2 {L k , t k /2, d, d*) 

k=l 

xi{\\e\\l<iM^)L]!\n- 1 ). 

This result implies that, under the MISE criterion and for the
portfolios of blocks and thresholds discussed in Section 5, the two
estimators perform similarly.

7. Discussion. 

7.1. Parameter K in EP estimator. In the theory of oracle inequalities
this parameter is assumed to be given; see [8, 21]. For a minimax (or
adaptive) setting it should be chosen in such a way that the squared
bias of the oracle is negligible with respect to its variance. For
instance, for a Sobolev class with index $\alpha$ this is achieved if
$h^*(u)$ is zero on frequencies larger than
$\gamma_n n^{1/(2\alpha+1)}$ where $\gamma_n$ increases to infinity as
slowly as desired; this remark explains how $K := K(n)$ was chosen in
Section 2. At the same time, for infinitely differentiable distributions
$K(n)$ may be logarithmic, that is, dramatically smaller than for
Sobolev functions. Further, for distributions with bounded spectrum
$K(n)$ may be any increasing-to-infinity sequence.

7.2. Sobolev classes with index $\alpha \le 1/2$. This is a new addition
to the set of distributions covered by the theory of minimax estimation
and oracle inequalities. Obviously there were serious technical
difficulties in dealing with such Sobolev densities. Also, the case of
Sobolev densities with index larger than 1/2 is very nice and appealing
because it implies that the characteristic function is absolutely
integrable, the corresponding density is defined by Fourier inverse
formula (3.1) and it is bounded and uniformly continuous. Sobolev
characteristic functions with index $\alpha \le 1/2$ do not have these
nice properties and, moreover, the inverse formula (3.1) is understood,
according to Plancherel's theorem, as a limit in
$L_2(-\infty, \infty)$-norm of
$(2\pi)^{-1} \int_{-A}^{A} e^{-iux} h(u)\,du$, $A \to \infty$. At the
same time, it is important to note that the characteristic function is
not necessarily absolutely integrable and, for instance, there is a vast
class of Pólya-type characteristic functions that are not absolutely
integrable. Namely, according to the famous Pólya condition, a
real-valued and continuous function $g(u)$, $u \in (-\infty, \infty)$ is
the characteristic function of an absolutely continuous distribution if
$g(0) = 1$, $g(-u) = g(u)$, $g(u)$ is convex for positive $u$ and
$g(u) \to 0$ as $u \to \infty$; see [40], page 70. Note that the
condition involves no restriction on how fast $g(u)$ must vanish.
Characteristic functions $h(u) = [1 + |u|^{\beta}]^{-1}$,
$1/2 < \beta \le 1$ from the Linnik distribution family as well as
$h(u) = [1 + |u|^2]^{-p}$, $1/4 < p \le 1/2$, studied by Karl Pearson,
are particular examples of characteristic functions which are square
integrable but not absolutely integrable; see [2, 43].
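The statement about the Linnik family can be checked numerically. The sketch below is an illustration only: the helper names are hypothetical, the value $\beta = 0.75$ is an arbitrary member of the family, and the integrals over $[0, A]$ are approximated by a midpoint rule after the substitution $u = e^s - 1$, which keeps the grid fine near zero and coarse in the tail.

```python
import math


def integral_to(upper, integrand, steps=20000):
    """Midpoint rule for the integral of integrand over [0, upper], using u = exp(s) - 1."""
    s_max = math.log1p(upper)
    ds = s_max / steps
    total = 0.0
    for i in range(steps):
        s = (i + 0.5) * ds
        total += integrand(math.expm1(s)) * math.exp(s) * ds
    return total


def linnik_cf(u, beta):
    """Characteristic function 1 / (1 + |u|^beta) of the Linnik family."""
    return 1.0 / (1.0 + abs(u) ** beta)


beta = 0.75
for upper in (1e2, 1e4, 1e6):
    abs_part = integral_to(upper, lambda u: linnik_cf(u, beta))
    sq_part = integral_to(upper, lambda u: linnik_cf(u, beta) ** 2)
    print(upper, round(abs_part, 3), round(sq_part, 4))
```

As the upper limit grows, the integral of $|h(u)|$ keeps increasing (roughly like $A^{1-\beta}$), while the integral of $|h(u)|^2$ stabilizes. This is the behavior of a characteristic function that is square integrable but not absolutely integrable.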

7.3. Why MISE? The choice of the loss function in the density estimation
literature has always been a source of hot debates thanks to
statisticians passionately devoted to $L_1$-distance, different
$L_p$-distances with $p > 1$, Hellinger distances, distances based upon
Kullback-Leibler numbers, etc. The interested reader can find a
discussion of these approaches in [9, 14, 19]. Until now, there was no
objective argument in favor of the $L_2$-distance/MISE because it was
always assumed that underlying characteristic functions were absolutely
integrable. The inclusion of not absolutely integrable characteristic
functions changes the situation because now Plancherel's theorem (and
correspondingly $L_2$-norm) is the necessary tool. This remark, at least
partially, may serve as a justification for using the MISE criterion.

7.4. Distributions with finite support. In many applied problems the 
statistician knows support of the density; circular data is a familiar ex- 
ample. Suppose that the support is [0, 1]. Then, following the Gaussian-shift 
approach of Section 2, the density can be written in Fourier domain as 
fix) = [l + E^i^^M* e [0,1]),^ := Ji fWwix) where := 
$2^{1/2}\cos(\pi jx),\ j = 1, 2, \ldots\}$ is a classical cosine basis on $[0, 1]$. Further, $\|y\|_k^2 :=
\sum_{j\in B_k} y_j^2$, $y_j := n^{-1}\sum_{l=1}^n \varphi_j(X_l)$ may serve as an analogue of $\|y\|_k^2$ in the
Gaussian-shift and probability settings of Sections 2 and 3 (note that here
we again intentionally use the same notation). Then the EP finite-support
density estimator is defined as

\[ \tilde f(x) := 1 + \sum_{k=1}^K \hat\mu_k \sum_{j\in B_k} y_j\varphi_j(x)\, I(x\in[0,1]), \tag{7.1} \]

where $K$ and $\hat\mu_k$ are the same as in Sections 2 and 3. Corresponding
exponential inequalities and minimax results can be found in the technical report [22].

7.5. Different types of oracle inequalities. Corollary 4.1 (or Theorem 2.1)
presents three different types of oracle inequalities, and each may be useful
on its own. Inequality (2.13) is useful because all its components, apart from
the oracle's MISE, depend only on blocks and thresholds but not on an
estimated function. This type of oracle inequality can be found in [6, 7, 8]. The
other types of inequalities, which originated in [18], are more complicated because
the remainder depends on an estimated function; but this complexity may
be useful. As an example, let us discuss the phenomenon,
mentioned in Remark 2.1, of the necessity for thresholds to vanish for sharp
mimicking of the oracle's MISE. The nonparametric blockwise-shrinkage
literature contains results of intensive numerical studies which indicate
excellent performance of estimates with nonvanishing thresholds; see a discussion
in [6, 7, 10, 12]. Do these studies contradict the theory? To answer this
question, let us examine oracle inequalities in Theorem 2.1. Oracle inequality
(2.13) cannot shed light on the phenomenon because $t_k$ must vanish for the
right-hand side of (2.13) to converge to the oracle's MISE. On the other
hand, oracle inequalities (2.11) and (2.12) can explain the phenomenon.
Indeed, we can relax the assumption on thresholds by assuming that an
estimated density (or shift function) satisfies

\[ \cdots = o_n(1). \]

It is not difficult to check numerically that this assumption often holds
for functions used in numerical studies. Thus, oracle inequalities (2.11) and
(2.12) have allowed us to shed new light on the above-mentioned numerical
results.

Further, let us note that an assumption like (5.2), by including terms
depending on an estimated density, bears the same flavor as the oracle
inequalities (2.11) and (2.12) because it allows the statistician to justify/explain a
special portfolio of blocks and thresholds for a targeted class of functions.

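The finite-support estimator (7.1) of Section 7.4 can be illustrated by a minimal sketch. It assumes the hard-threshold shrinkage $\hat\mu_k = \hat\Theta_k(\hat\Theta_k+n^{-1})^{-1}I(\hat\Theta_k > t_kn^{-1})$ with $\hat\Theta_k = L_k^{-1}\|y\|_k^2 - n^{-1}$; the function name `ep_cosine_density`, the particular blocks and the thresholds are hypothetical choices made only for illustration and are not taken from the paper.

```python
import numpy as np

def ep_cosine_density(sample, blocks, thresholds):
    """Return a callable blockwise-shrinkage density estimate on [0, 1].

    blocks     : list of lists of positive integers j (hypothetical choice)
    thresholds : one t_k >= 0 per block (hypothetical choice)
    """
    sample = np.asarray(sample, dtype=float)
    n = sample.size
    freqs, weights = [], []
    for block, t in zip(blocks, thresholds):
        j = np.asarray(block, dtype=float)
        # y_j = n^{-1} sum_l phi_j(X_l), phi_j(x) = sqrt(2) cos(pi j x)
        y = np.sqrt(2.0) * np.cos(np.pi * np.outer(j, sample)).mean(axis=1)
        theta_hat = np.sum(y ** 2) / len(block) - 1.0 / n
        # shrinkage coefficient: zero unless theta_hat exceeds t / n
        mu_hat = theta_hat / (theta_hat + 1.0 / n) if theta_hat > t / n else 0.0
        freqs.extend(block)
        weights.extend(mu_hat * y)
    freqs = np.asarray(freqs, dtype=float)
    weights = np.asarray(weights, dtype=float)

    def f_tilde(x):
        x = np.atleast_1d(np.asarray(x, dtype=float))
        basis = np.sqrt(2.0) * np.cos(np.pi * np.outer(x, freqs))
        inside = (x >= 0.0) & (x <= 1.0)  # the estimate is set to 0 off [0, 1]
        return (1.0 + basis @ weights) * inside

    return f_tilde
```

Because every cosine element integrates to zero over $[0,1]$, the sketch always integrates to one, whatever the shrinkage coefficients are; it is not guaranteed to be nonnegative.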
7.6. Possible applications in related problems. There are many related
applied problems where the obtained results may motivate new research 
and innovative procedures, with particular examples being survival analysis, 
deconvolution, biased data, error density estimation, time series analysis, 
etc. The developed methodology can be of special interest for the analysis 
of wavelet estimators. A discussion of possible extensions can be found in 
[5, 22, 25, 26, 28, 41, 45]. 

8. Proofs. In what follows $\hat\Theta_k := L_k^{-1}\|y\|_k^2 - n^{-1}$ and $\Theta_k := L_k^{-1}\|\theta\|_k^2$.
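Proof of Theorem 4.1. A direct calculation implies that

\[ E\hat h(u) = h(u), \qquad E|\hat h(u) - h(u)|^2 = n^{-1}(1 - |h(u)|^2). \tag{8.1} \]

Recall that $E\|\hat\theta^* - \theta\|_k^2 = E\int_{B_k}|\mu_k\hat h(u) - h(u)|^2\,du$. Then, for any $v_k \in (0,1)$,

\[
\begin{aligned}
E\|\hat\theta - \theta\|_k^2 &= E\int_{B_k}|\hat\mu_k\hat h(u) - h(u)|^2\,du \\
&\le (1+v_k)E\int_{B_k}|\mu_k\hat h(u) - h(u)|^2\,du + (1+v_k^{-1})E\int_{B_k}|(\hat\mu_k-\mu_k)\hat h(u)|^2\,du.
\end{aligned} \tag{8.2}
\]

Using (8.1) we get

\[
\begin{aligned}
E\int_{B_k}|\mu_k\hat h(u) - h(u)|^2\,du
&= n^{-1}\mu_k^2\int_{B_k}(1-|h(u)|^2)\,du + (1-\mu_k)^2L_k\Theta_k \\
&= \frac{\Theta_k^2n^{-1}L_k}{(\Theta_k+n^{-1})^2} + \frac{n^{-2}L_k\Theta_k}{(\Theta_k+n^{-1})^2} - n^{-1}\mu_k^2L_k\Theta_k \\
&= n^{-1}L_k\mu_k - n^{-1}\mu_k^2L_k\Theta_k.
\end{aligned} \tag{8.3}
\]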
In particular, this verifies (4.5). The second expectation in the right-hand
side of (8.2) can be written as

\[ E\Bigl\{\int_{B_k}|(\hat\mu_k - \mu_k)\hat h(u)|^2\,du\Bigr\} = E\{(\hat\mu_k - \mu_k)^2L_k(\hat\Theta_k + n^{-1})\} =: E\{A\}. \]
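The closed form of the oracle's block risk in (8.3) can be checked with exact rational arithmetic. The sketch below assumes $\mu_k = \Theta_k/(\Theta_k + n^{-1})$ and $\int_{B_k}(1-|h(u)|^2)\,du = L_k - L_k\Theta_k$, which follows from the definition of $\Theta_k$; the function name `oracle_block_risk` is hypothetical.

```python
from fractions import Fraction

def oracle_block_risk(theta, n, L):
    """Return both sides of the oracle-risk identity as exact fractions.

    theta : Theta_k (Fraction), n : sample size (int), L : block length L_k.
    """
    inv_n = Fraction(1, n)
    mu = theta / (theta + inv_n)  # oracle shrinkage coefficient
    # variance part + squared-bias part
    lhs = inv_n * mu ** 2 * (L - L * theta) + (1 - mu) ** 2 * L * theta
    # closed form n^{-1} L mu - n^{-1} mu^2 L Theta
    rhs = inv_n * L * mu - inv_n * mu ** 2 * L * theta
    return lhs, rhs
```

Both returned values coincide for any admissible rational inputs, which is the algebraic step behind the last equality in (8.3).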

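Let us evaluate the term $A$. Note that (2.3) can be rewritten as $\hat\mu_k =
\hat\Theta_k(\hat\Theta_k + n^{-1})^{-1}I(\hat\Theta_k > t_kn^{-1})$. In what follows we skip subscripts whenever
no confusion may occur. Write

\[ A = \frac{n^{-2}(\hat\Theta-\Theta)^2I(\hat\Theta > tn^{-1})L}{(\hat\Theta + n^{-1})(\Theta + n^{-1})^2} + \mu^2L(\hat\Theta + n^{-1})I(\hat\Theta \le tn^{-1}) =: A_1 + A_2. \]

Set $q := 1 - (L + 1)^{-1/2}$ and evaluate $A_1$:

\[
\begin{aligned}
A_1 &= \frac{n^{-2}(\hat\Theta-\Theta)^2I(\hat\Theta > tn^{-1})L}{(\hat\Theta + n^{-1})(\Theta + n^{-1})^2} \\
&\le \frac{n^{-2}(\hat\Theta-\Theta)^2L\,I(\hat\Theta-\Theta > qtn^{-1})I(\Theta \le (1-q)tn^{-1})}{(\hat\Theta + n^{-1})(\Theta + n^{-1})^2} \\
&\quad + \frac{n^{-1}(\hat\Theta-\Theta)^2L}{(\Theta + n^{-1})^2}I(\Theta > (1-q)tn^{-1}) =: A_{11} + A_{12}.
\end{aligned}
\]

Plainly $A_{11} \le L(\hat\Theta - \Theta)I(\hat\Theta - \Theta > qtn^{-1})I(\Theta \le (1-q)tn^{-1})$, and using the
Cauchy-Schwarz inequality we get

\[ E\{A_{11}\} \le L\,E^{1/2}\{(\hat\Theta - \Theta)^2\}\Pr{}^{1/2}\{\hat\Theta - \Theta > qtn^{-1}\}I(\Theta \le (1-q)tn^{-1}). \]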

To continue we need a result that will be proved in the Appendix. 

Lemma 8.1. Let the assumption of Theorem 4.1 hold. Set $d :=
\int_{-\infty}^{\infty}|h(u)|^2\,du$, $d_j := \max_{v\in B}\int_B(|h(u-v)|^j + |h(u+v)|^j)\,du$, $d^* := \min_{z>0}(z + \cdots)$.

(a) The moment inequality holds:

\[ E(\hat\Theta - \Theta)^2 \le L^{-1}n^{-1}[2d_1\Theta + d_2n^{-1}]. \tag{8.4} \]

(b) For $q = 1-(L+1)^{-1/2}$,

\[ \Pr\{\hat\Theta - \Theta > qtn^{-1}\}I(\Theta \le (1-q)tn^{-1}) \le G^2(L, t, d, d^*), \tag{8.5} \]

where $G(L,t,d,d^*)$ is defined in (4.8).

(c) The following relations between $d_1$, $d_2$ and $d$ hold:

\[ d_1 \le [2Ld_2]^{1/2} \quad\text{and}\quad d_2 \le d. \tag{8.6} \]
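The relations in part (c) can be illustrated numerically. The sketch below assumes a standard normal density, whose characteristic function satisfies $|h(u)| = e^{-u^2/2}$ and $d = \pi^{1/2}$, and a block $B = [a, a+L]$; the helper `d_constants`, the grid size and the values of $a$ and $L$ are hypothetical and chosen only for illustration.

```python
import numpy as np

def d_constants(abs_h, a, L, m=401):
    """Approximate d_1 and d_2 for B = [a, a + L] by the trapezoid rule.

    abs_h : vectorized function returning |h(u)|
    The maximum over v in B is taken over the same grid as the integral.
    """
    u = np.linspace(a, a + L, m)
    w = np.full(m, L / (m - 1))  # trapezoid weights, summing to L
    w[0] *= 0.5
    w[-1] *= 0.5
    d1 = d2 = 0.0
    for v in u:
        minus, plus = abs_h(u - v), abs_h(u + v)
        d1 = max(d1, float(np.sum(w * (minus + plus))))
        d2 = max(d2, float(np.sum(w * (minus ** 2 + plus ** 2))))
    return d1, d2

def abs_h_normal(u):
    # modulus of the characteristic function of N(0, 1)
    return np.exp(-0.5 * np.asarray(u, dtype=float) ** 2)

a, L = 0.5, 2.0
d = float(np.sqrt(np.pi))  # integral of exp(-u^2) over the real line
d1, d2 = d_constants(abs_h_normal, a, L)
```

For this example one can compare `d1` with `(2 * L * d2) ** 0.5` and `d2` with `d` to see both inequalities of (8.6) at work.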
Using Lemma 8.1 we get

\[ E\{A_{11}\} \le n^{-1}L[L^{-1}(d + 2^{3/2}d^{1/2}t)]^{1/2}G(L,t,d,d^*)I(\Theta \le (1-q)tn^{-1}). \tag{8.7} \]

Note that $(1-q)^{-1} = (L+1)^{1/2}$, and then using (8.4) and (8.6) we get

\[
\begin{aligned}
E\{A_{12}\} &\le n^{-1}L(\Theta + n^{-1})^{-2}L^{-1}n^{-1}[2d_1\Theta + d_2n^{-1}]I(\Theta > (1-q)tn^{-1}) \\
&\le n^{-1}L[L^{-1}(2\mu d_1 + d_2)]I(\Theta > (1-q)tn^{-1}) \\
&\le n^{-1}\mu L[L^{-1}(2^{3/2}L^{1/2}d^{1/2} + d[1 + (L+1)^{1/2}t^{-1}])] \\
&\le n^{-1}\mu L[L^{-1/2}(2^{3/2}d^{1/2} + d(L^{-1/2} + (1+L^{-1})^{1/2}t^{-1}))].
\end{aligned}
\]

Further,

\[
\begin{aligned}
A_2 &= \mu^2L(\hat\Theta + n^{-1})I(\hat\Theta \le tn^{-1})[I(\Theta > 2tn^{-1}) + I(\Theta \le 2tn^{-1})] \\
&\le \mu^2Ln^{-1}(1+t)I(\Theta - \hat\Theta > \Theta/2)I(\Theta > 2tn^{-1}) \\
&\quad + \mu^2Ln^{-1}(1+t)I(\Theta \le 2tn^{-1}) =: A_{21} + A_{22}.
\end{aligned}
\]
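Using the Chebyshev inequality and (8.4) we get

\[
\begin{aligned}
E\{A_{21}\} &\le n^{-1}\mu L\,\frac{n^{-1}L^{-1}(2d_1\Theta + d_2n^{-1})}{(\Theta/2)^2}\,(1+t)\mu I(\Theta > 2tn^{-1}) \\
&\le n^{-1}\mu L[L^{-1}(8(2Ld_2)^{1/2} + 2d_2t^{-1})] \\
&\le n^{-1}\mu L[12d^{1/2}L^{-1/2} + 2dL^{-1}t^{-1}].
\end{aligned}
\]

To evaluate $A_{22}$ we note that $(1+t)\mu I(\Theta \le 2tn^{-1}) \le 2tI(\Theta \le 2tn^{-1})$, and
then

\[ A_{22} \le n^{-1}\mu L[\min(\mu(1+t), 2t)I(\Theta \le 2tn^{-1})]. \]

Combining the obtained results we conclude that

\[
\begin{aligned}
E\|\hat\theta - \theta\|_k^2 &\le E\|\hat\theta^* - \theta\|_k^2 \\
&\quad + n^{-1}L_k\mu_k\bigl\{v_k(1 - \mu_k\Theta_k) + (1+v_k^{-1}) \\
&\qquad \times \bigl[L_k^{-1/2}(15d^{1/2} + 3d(1+L_k^{-1/2})(1+t_k^{-1})) \\
&\qquad\quad + \min(\mu_k(1+t_k), 2t_k)I(\Theta_k \le 2t_kn^{-1})\bigr]\bigr\} \\
&\quad + n^{-1}L_k(1+v_k^{-1})[L_k^{-1}(d + 3d^{1/2}t_k)]^{1/2} \\
&\qquad \times G(L_k,t_k,d,d^*)I(\Theta_k \le L_k^{-1/2}t_kn^{-1}).
\end{aligned} \tag{8.8}
\]

This verifies (4.4). □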
Proof of Theorem 5.1. Relation (5.4) is known for the case of $\alpha >
1/2$; see [23, 46, 50]. Consider the case $\alpha \le 1/2$. The lower minimax bound
(5.4) is established in [22]; the proof is too lengthy to reproduce here and
the interested reader is referred to [22]. The upper minimax bound (5.4), as
well as the validity of upper bound (5.3) for any $\alpha$, is established with the
help of the oracle inequality. We begin with the analysis of the oracle $f^*$ defined
in (3.10). Following [17], a direct calculation, based on using (4.5), yields
that whenever $L_{k+1}/L_k \to 1$ as $k \to \infty$, the oracle's MISE satisfies

\[ \sup_{f\in S(\alpha,Q)} E\int_{-\infty}^{\infty}(f^*(x) - f(x))^2\,dx = P(\alpha,Q)n^{-2\alpha/(2\alpha+1)}(1 + o_n(1)). \]

In other words, the oracle is sharp minimax.

Then oracle inequality (4.4), together with assumption (5.2), yields

\[
\begin{aligned}
\sup_{f\in S(\alpha,Q)} E\int_{-\infty}^{\infty}(\tilde f(x) - f(x))^2\,dx\,(1 + o_n(1))
&= \sup_{f\in S(\alpha,Q)} E\int_{-\infty}^{\infty}(f^*(x) - f(x))^2\,dx \\
&= P(\alpha,Q)n^{-2\alpha/(2\alpha+1)}(1 + o_n(1)).
\end{aligned}
\]

This result shows that for Sobolev classes the EP estimator is sharp minimax
and matches performance of the oracle. □

Theorems 5.2 and 5.3 are verified identically. 

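Proof of Theorem 6.1. Write

\[
\begin{aligned}
\int_{-\infty}^{\infty}(\tilde f_S(x) - \tilde f(x))^2\,dx
&= \pi^{-1}\int_0^{\infty}|\tilde h_S(u) - \tilde h(u)|^2\,du = \pi^{-1}\sum_{k=1}^K(\hat\mu_{Sk} - \hat\mu_k)^2\|y\|_k^2 \\
&= \pi^{-1}\sum_{k=1}^K\frac{(L_kt_kn^{-1})^2}{\|y\|_k^2}I(\|y\|_k^2 > (1+t_k)L_kn^{-1}) \\
&\le \pi^{-1}n^{-1}\sum_{k=1}^K L_kt_k^2(1+t_k)^{-1}I(\hat\Theta_k > t_kn^{-1}) \\
&\qquad \times [I(\Theta_k \le (t_k/2)n^{-1}) + I(\Theta_k > (t_k/2)n^{-1})].
\end{aligned} \tag{8.9}
\]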
Now let us make several preliminary calculations. First of all, set $q :=
1 - (1 + L_k)^{-1/2}$, and write

\[
\begin{aligned}
I(\hat\Theta_k > t_kn^{-1})I(\Theta_k \le (t_k/2)n^{-1})
&\le I(\hat\Theta_k - \Theta_k > (t_k/2)n^{-1})I(\Theta_k \le (t_k/2)n^{-1}) \\
&\le I(\hat\Theta_k - \Theta_k > q(t_k/2)n^{-1}) \\
&\quad \times [I(\Theta_k \le (1-q)(t_k/2)n^{-1}) \\
&\qquad + I((1-q)(t_k/2)n^{-1} < \Theta_k \le (t_k/2)n^{-1})].
\end{aligned}
\]

Using (8.5) we get

\[
\begin{aligned}
&E\{I(\hat\Theta_k - \Theta_k > q(t_k/2)n^{-1})\}I(\Theta_k \le (1-q)(t_k/2)n^{-1}) \\
&\qquad \le G^2(L_k, t_k/2, d, d^*)I(\Theta_k \le (1+L_k)^{-1/2}(t_k/2)n^{-1}).
\end{aligned}
\]

Using (8.4), (8.6) and a plain inequality $I(\Theta_k > bn^{-1}) \le \mu_k(1 + b^{-1})$ we get

\[
\begin{aligned}
&E\{I(\hat\Theta_k - \Theta_k > q(t_k/2)n^{-1})\}\,I((1-q)(t_k/2)n^{-1} < \Theta_k \le (t_k/2)n^{-1}) \\
&\qquad \le 4L_k^{-1}n^{-1}[2d_1\Theta_k + d_2n^{-1}][q^2t_k^2n^{-2}]^{-1}
\,I((1-q)(t_k/2)n^{-1} < \Theta_k \le (t_k/2)n^{-1}) \\
&\qquad \le 4q^{-2}t_k^{-2}L_k^{-1}[2\mu_k(\Theta_k + n^{-1})n(2L_kd)^{1/2} \\
&\qquad\qquad + d\mu_k(1 + 2(1+L_k)^{1/2}t_k^{-1})]I(\Theta_k \le (t_k/2)n^{-1}) \\
&\qquad \le 4q^{-2}\mu_kt_k^{-2}L_k^{-1/2}[3(1+t_k)d^{1/2} + d(L_k^{-1/2} + 2t_k^{-1}(1+L_k^{-1/2}))] \\
&\qquad\qquad \times I(\Theta_k \le (t_k/2)n^{-1}).
\end{aligned}
\]
Combining these results we get

\[
\begin{aligned}
&\pi^{-1}n^{-1}\sum_{k=1}^K L_kt_k^2(1+t_k)^{-1}E\{I(\hat\Theta_k > t_kn^{-1})\}I(\Theta_k \le (t_k/2)n^{-1}) \\
&\qquad \le \pi^{-1}n^{-1}\sum_{k=1}^K L_kt_k^2(1+t_k)^{-1}G^2(L_k, t_k/2, d, d^*) \\
&\qquad\qquad \times I(\Theta_k \le (L_k+1)^{-1/2}(t_k/2)n^{-1}) \\
&\qquad\quad + \pi^{-1}n^{-1}\sum_{k=1}^K L_k\mu_k[12L_k^{-1/2}(1 - (L_k+1)^{-1/2})^{-2} \\
&\qquad\qquad \times (d^{1/2} + dt_k^{-1}(1+L_k^{-1/2}))].
\end{aligned}
\]

Further, we note that $I(\Theta_k > (t_k/2)n^{-1}) \le \mu_k(1 + 2t_k^{-1})$, and that

\[
\pi^{-1}n^{-1}\sum_{k=1}^K L_kt_k^2(1+t_k)^{-1}I(\Theta_k > (t_k/2)n^{-1})
\le \pi^{-1}n^{-1}\sum_{k=1}^K L_k\mu_k[2t_kI(\Theta_k > (t_k/2)n^{-1})].
\]

Combining results verifies (6.5). □

APPENDIX 

Proof of Lemma 8.1. The first part of (8.6) is based on the Cauchy-Schwarz
inequality, the second on the remark that $-(u + v) < u - v$ for any
$u, v \in B$ and $|h(u)| = |h(-u)|$ (let us also note that $d_2 \to d$ as $L \to \infty$). Let
us now verify (8.4). Write

\[
\begin{aligned}
E(\hat\Theta - \Theta)^2 &= L^{-2}E\Bigl[\int_B(|\hat h(u)|^2 - |h(u)|^2 - n^{-1})\,du\Bigr]^2 \\
&= L^{-2}E\int_B\int_B(|\hat h(u)|^2 - |h(u)|^2 - n^{-1})(|\hat h(v)|^2 - |h(v)|^2 - n^{-1})\,du\,dv \\
&= L^{-2}\int_B\int_B E\{|\hat h(u)|^2|\hat h(v)|^2\}\,du\,dv \\
&\quad - 2L^{-2}\int_B E\{|\hat h(u)|^2\}\,du\int_B(|h(v)|^2 + n^{-1})\,dv \\
&\quad + L^{-2}\Bigl[\int_B(|h(u)|^2 + n^{-1})\,du\Bigr]^2 \\
&=: A_1 + A_2 + A_3.
\end{aligned} \tag{A.1}
\]



Consider these three addends in turn. In what follows $l_k \ne l_m \ne \cdots \ne l_q$
means that all these parameters are different, and recall the assumption
$n > 3$. Write

\[
\begin{aligned}
n^4E\{|\hat h(u)|^2|\hat h(v)|^2\}
&= \sum_{l_1,l_2,l_3,l_4=1}^n E\{\exp(iuX_{l_1} - iuX_{l_2} + ivX_{l_3} - ivX_{l_4})\} \\
&\le \sum_{l_1\ne l_2\ne l_3\ne l_4}|h(u)|^2|h(v)|^2
+ \sum_{l_1=l_2\ne l_3\ne l_4}|h(v)|^2 + \sum_{l_3=l_4\ne l_1\ne l_2}|h(u)|^2 \\
&\quad + 2\sum_{l_1=l_3\ne l_2\ne l_4}|h(u)||h(v)|(|h(u-v)| + |h(u+v)|) \\
&\quad + 2\sum_{l_1=l_2=l_3\ne l_4}|h(v)|^2 + 2\sum_{l_1=l_3=l_4\ne l_2}|h(u)|^2 \\
&\quad + \sum_{l_1=l_4\ne l_2=l_3}(|h(u-v)|^2 + |h(u+v)|^2)
+ \sum_{l_1=l_2\ne l_3=l_4}1 + \sum_{l_1=l_2=l_3=l_4}1.
\end{aligned}
\]



Then, using $2|h(u)||h(v)| \le |h(u)|^2 + |h(v)|^2$, $|h(u)| \le 1$, $(n-1)(n-2)(n-3)
= n^3 - 6n^2 + 11n - 6$, $(n-1)(n-2) = n^2 - 3n + 2$ and simple algebra, we
get

\[
\begin{aligned}
A_1 &\le \Theta^2[1 - 6n^{-1} + 11n^{-2} - 6n^{-3}] + 2n^{-1}\Theta[1 - 3n^{-1} + 2n^{-2}] \\
&\quad + 2L^{-1}n^{-1}d_1\Theta + 4n^{-2}\Theta[1 - n^{-1}] + n^{-2}L^{-1}d_2 + n^{-2} \\
&= \Theta^2 - 2n^{-1}\Theta^2 + \Theta^2[-4n^{-1} + 11n^{-2} - 6n^{-3}] \\
&\quad + 2n^{-1}\Theta - 2n^{-2}\Theta + L^{-1}n^{-1}[2d_1\Theta + d_2n^{-1}] + n^{-2}.
\end{aligned}
\]

Further, (8.1) implies $E|\hat h(u)|^2 = |h(u)|^2 + n^{-1}(1 - |h(u)|^2)$, and we get

\[
\begin{aligned}
A_2 &= -2L^{-2}\int_B(|h(u)|^2 + n^{-1}(1 - |h(u)|^2))\,du\int_B(|h(v)|^2 + n^{-1})\,dv \\
&= -2[\Theta + n^{-1}(1 - \Theta)][\Theta + n^{-1}] \\
&= -2\Theta^2 - 4n^{-1}\Theta - 2n^{-2} + 2n^{-1}\Theta^2 + 2n^{-2}\Theta.
\end{aligned}
\]

Also $A_3 = (n^{-1} + \Theta)^2 = n^{-2} + 2n^{-1}\Theta + \Theta^2$. Combining the results in (A.1)
and using $-4n^{-1} + 11n^{-2} - 6n^{-3} \le 0$ for $n > 1$ we verify (8.4).
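The cancellation that leads from the three addends of (A.1) to (8.4) can be checked with exact rational arithmetic. This is only a sketch of the algebra: the function name `moment_terms` and the numerical values of $\Theta$, $L$, $d_1$, $d_2$ used when calling it are hypothetical.

```python
from fractions import Fraction as F

def moment_terms(theta, n, L, d1, d2):
    """Return (upper bound for A_1, A_2, A_3) as exact fractions."""
    a1_bound = (theta ** 2 * (1 - F(6, n) + F(11, n ** 2) - F(6, n ** 3))
                + F(2, n) * theta * (1 - F(3, n) + F(2, n ** 2))
                + 2 * d1 * theta / (L * n)
                + F(4, n ** 2) * theta * (1 - F(1, n))
                + d2 / (L * n ** 2)
                + F(1, n ** 2))
    a2 = -2 * (theta + F(1, n) * (1 - theta)) * (theta + F(1, n))
    a3 = (F(1, n) + theta) ** 2
    return a1_bound, a2, a3

def remainder(theta, n, L, d1, d2):
    """What is left after cancellation: a nonpositive term plus the bound (8.4)."""
    negative_part = theta ** 2 * (-F(4, n) + F(11, n ** 2) - F(6, n ** 3))
    bound_84 = (2 * d1 * theta + d2 / n) / (L * n)
    return negative_part, bound_84
```

The sum of the three values returned by `moment_terms` equals the sum of the two values returned by `remainder`, and the first of the latter is nonpositive for every integer $n \ge 2$.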
Let us check (8.5). Write

\[
\begin{aligned}
L\hat\Theta &= \int_B|\hat h(u)|^2\,du - Ln^{-1} = n^{-2}\sum_{1\le l,m\le n}\int_B\exp\{iu(X_l - X_m)\}\,du - Ln^{-1} \\
&= n^{-2}\sum_{1\le l\ne m\le n}\int_B\exp\{iu(X_l - X_m)\}\,du \\
&= n^{-2}\,2\sum_{1\le l<m\le n}\int_B\cos(u(X_l - X_m))\,du =: n^{-2}\,2\sum_{1\le l<m\le n}g(X_l - X_m).
\end{aligned}
\]

Note that $g(x,y) := g(x - y)$ is a symmetric function in $(x,y)$ which can be
viewed as a kernel of U-statistics. Thus we can use known exponential
inequalities for U-statistics to analyze $\hat\Theta$. In what follows $X, X_1, \ldots, X_n, Y, Y_1, \ldots, Y_n$
are i.i.d. random variables according to an underlying density $f$. Using
Hoeffding's decomposition we continue:

\[
\begin{aligned}
L\hat\Theta &= 2n^{-2}\sum_{1\le l<m\le n}H(X_l, X_m) \\
&\quad + 2(n-1)n^{-2}\sum_{l=1}^n(E\{g(X_l - Y)|X_l\} - E\{g(X - Y)\}) \\
&\quad + (n-1)n^{-1}E\{g(X - Y)\} =: A_1 + A_2 + A_3,
\end{aligned} \tag{A.2}
\]

where $H(X,Y) := g(X - Y) - E\{g(X - Y)|X\} - E\{g(X - Y)|Y\} + E\{g(X - Y)\}$.
A direct calculation shows that

\[ E\{g(X - Y)\} = \mathrm{Re}\Bigl\{\int_B Ee^{iu(X-Y)}\,du\Bigr\} = \int_B|h(u)|^2\,du = L\Theta \tag{A.3} \]

and

\[ E\{g(X - Y)|Y\} = \mathrm{Re}\Bigl\{\int_B e^{iuY}h(-u)\,du\Bigr\}. \tag{A.4} \]

This implies

\[ H(X,Y) = \int_B\cos(u(X - Y))\,du - \mathrm{Re}\Bigl\{\int_B(e^{iuX} + e^{iuY})h(-u)\,du\Bigr\} + L\Theta. \tag{A.5} \]
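The representation of $L\hat\Theta$ as a sum over pairs with kernel $g$ can be checked numerically. The sketch below assumes a block $B = [a, a+L]$, so that $g(x) = \int_B\cos(ux)\,du = (\sin((a+L)x) - \sin(ax))/x$ with $g(0) = L$; the function names, the grid size and the simulated sample are hypothetical choices for illustration only.

```python
import numpy as np

def g(x, a, L):
    """Kernel g(x) = integral of cos(u x) over B = [a, a + L]."""
    x = np.asarray(x, dtype=float)
    safe = np.where(x == 0.0, 1.0, x)  # avoid division by zero
    val = (np.sin((a + L) * safe) - np.sin(a * safe)) / safe
    return np.where(x == 0.0, L, val)

def l_theta_hat_quadrature(sample, a, L, m=20001):
    """L * Theta-hat from the empirical characteristic function (trapezoid rule)."""
    u = np.linspace(a, a + L, m)
    ecf = np.exp(1j * np.outer(u, sample)).mean(axis=1)
    f = np.abs(ecf) ** 2
    step = L / (m - 1)
    integral = step * (f.sum() - 0.5 * (f[0] + f[-1]))
    return integral - L / sample.size

def l_theta_hat_ustat(sample, a, L):
    """L * Theta-hat as 2 n^{-2} times the sum of g(X_l - X_m) over l < m."""
    n = sample.size
    diffs = sample[:, None] - sample[None, :]
    upper = np.triu_indices(n, k=1)
    return 2.0 / n ** 2 * g(diffs[upper], a, L).sum()
```

Up to quadrature error the two functions return the same number, which reflects the fact that the $n$ diagonal terms of the double sum contribute exactly $Ln^{-1}$.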



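According to Theorem 1 in [13], for all $z > 0$ there exists a universal
constant $c_1$ such that

\[ \Pr\{|A_1| > z\} \le c_1\Pr\{|\tilde A_1| > z/c_1\}, \tag{A.6} \]

where $\tilde A_1 := n^{-2}\sum_{1\le l\ne m\le n}H(X_l, Y_m) = n^{-2}\sum_{1\le l,m\le n}H(X_l, Y_m) -
n^{-2}\sum_{l=1}^nH(X_l, Y_l)$ is a decoupled version of $A_1$. Using (A.2)-(A.6) we write
for any $q, \gamma \in (0,1)$,

\[
\begin{aligned}
\Pr\{\hat\Theta - \Theta > qtn^{-1}\}
&= \Pr\{A_1 + A_2 + [(n-1)/n]L\Theta - L\Theta > qtLn^{-1}\} \\
&\le \Pr\{A_1 + A_2 > qtLn^{-1}\} \le \Pr\{A_1 > \gamma qtLn^{-1}\} \\
&\quad + \Pr\{A_2 > (1-\gamma)qtLn^{-1}\} \\
&\le c_1\Pr\{|\tilde A_1| > \gamma qtLn^{-1}/c_1\} + \Pr\{A_2 > (1-\gamma)qtLn^{-1}\} \\
&\le c_1\Pr\Bigl\{n^{-2}\Bigl|\sum_{1\le l,m\le n}H(X_l, Y_m)\Bigr| > \gamma^2qtLn^{-1}/c_1\Bigr\} \\
&\quad + c_1\Pr\Bigl\{n^{-2}\Bigl|\sum_{l=1}^nH(X_l, Y_l)\Bigr| > \gamma(1-\gamma)qtLn^{-1}/c_1\Bigr\} \\
&\quad + \Pr\{A_2 > (1-\gamma)qtLn^{-1}\}.
\end{aligned} \tag{A.7}
\]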

Consider the first probability. $H(x,y)$ is symmetric in $(x,y)$ and it is a
completely degenerated kernel in the sense that $E\{H(X,Y)|X\} = 0$. Thus
we can use the following exponential inequality (3.18) of [27].

Lemma A.1. Let $X, X_1, \ldots, X_n, Y, Y_1, \ldots, Y_n$ be i.i.d. Consider a
symmetric and completely degenerated kernel $H(x,y)$. Then there exists a
universal constant $c_2$ such that for any $z > 0$

\[
\begin{aligned}
&\Pr\Bigl\{\Bigl|\sum_{1\le l,m\le n}H(X_l, Y_m)\Bigr| > z\Bigr\} \\
&\qquad \le c_2\exp\Bigl\{-\frac{1}{c_2}\min\Bigl(\frac{z^2}{n^2E\{H^2(X,Y)\}}, \frac{z}{n\|H\|_*},
\frac{z^{2/3}}{[n\|E\{H^2(X,Y)|X\}\|_\infty]^{1/3}}, \frac{z^{1/2}}{\|H\|_\infty^{1/2}}\Bigr)\Bigr\},
\end{aligned} \tag{A.8}
\]

where

\[ \|H\|_* := \sup\{E\{H(X,Y)\psi_1(X)\psi_2(Y)\} : E\{\psi_1^2(X)\} \le 1, E\{\psi_2^2(Y)\} \le 1\}, \]

$\|E\{H^2(X,Y)|X\}\|_\infty := \sup_xE\{H^2(x,Y)\}$ and $\|H\|_\infty := \sup_{x,y}H(x,y)$.



Let us evaluate in turn the four components of the minimum in (A.8).
Using (A.5) and $(a + b + c)^2 \le 2a^2 + 4(b^2 + c^2)$ we get

\[
\begin{aligned}
E\{H^2(X,Y)\}
&\le E\int_B\int_B(\cos[(u-v)(X-Y)] + \cos[(u+v)(X-Y)])\,du\,dv \\
&\quad + 4E\Bigl[\mathrm{Re}\Bigl\{\int_B(e^{iuX} + e^{iuY})h(-u)\,du\Bigr\}\Bigr]^2 + 4(L\Theta)^2 \\
&\le \int_B\int_B(|h(u-v)|^2 + |h(u+v)|^2)\,du\,dv \\
&\quad + 4E\int_B\int_B(e^{iuX} + e^{iuY})(e^{-ivX} + e^{-ivY})h(-u)h(v)\,du\,dv + 4(L\Theta)^2 \\
&\le Ld_2 + 8\int_B\int_B[h(u-v) + h(u)h(-v)]h(-u)h(v)\,du\,dv + 4(L\Theta)^2 \\
&\le Ld_2 + 8[d_2L^3]^{1/2}\Theta + 12(L\Theta)^2.
\end{aligned}
\]

In the last inequality we used $\int_B|h(u - v)|\,du \le [Ld_2]^{1/2}$. Recall that we are
considering only $\Theta \le (1-q)tn^{-1}$, $q \in (0,1)$, and get

\[ E\{H^2(X,Y)\} \le L[d_2 + 4L^{1/2}(1-q)tn^{-1}(2d_2^{1/2} + 3L^{1/2}(1-q)tn^{-1})]. \tag{A.9} \]

Now we are considering $\|H\|_*$. In what follows the supremum is taken
over $\psi_1$ and $\psi_2$ such that $E\psi_j^2(X) \le 1$, $j = 1, 2$. Write

\[
\begin{aligned}
\|H\|_* &= \sup E\{H(X,Y)\psi_1(X)\psi_2(Y)\} \\
&\le \sup E\{g(X-Y)\psi_1(X)\psi_2(Y)\} + 2\sup E\{E\{g(X-Y)|X\}\psi_1(X)\} + L\Theta \\
&=: D_1 + 2D_2 + L\Theta.
\end{aligned}
\]

Introduce $A := \{x : f(x) \le z\}$, $\gamma \in (0,1)$, and write

\[
\begin{aligned}
D_1 &= \sup E\Bigl\{\int_B(1/2)[e^{iu(X-Y)} + e^{-iu(X-Y)}]\psi_1(X)\psi_2(Y)\,du\Bigr\} \\
&= \sup\int_B|E\{e^{iuX}\psi_1(X)\}|^2\,du \\
&\le (1+\gamma)\sup\int_B|E\{I(X\in A)e^{iuX}\psi_1(X)\}|^2\,du \\
&\quad + (1+\gamma^{-1})\sup\int_B|E\{I(X\in A^c)e^{iuX}\psi_1(X)\}|^2\,du \\
&=: D_{11} + D_{12}.
\end{aligned}
\]

Using the Plancherel identity we get

\[ D_{11} \le (1+\gamma)(2\pi)\sup\int_Af^2(x)\psi_1^2(x)\,dx \le (1+\gamma)2\pi z. \]

Further, using the Cauchy-Schwarz inequality we get

\[
\begin{aligned}
D_{12} &\le (1+\gamma^{-1})L\int_{A^c}f(x)\,dx\,\sup\int_{-\infty}^{\infty}f(x)\psi_1^2(x)\,dx \\
&\le (1+\gamma^{-1})Lz^{-1}\int_{A^c}f^2(x)\,dx.
\end{aligned}
\]

Set $\gamma = 0.2$ and get $D_1 \le 8d^*$.

Further, $D_2 = \sup E\{\mathrm{Re}\{\int_Be^{iuX}h(-u)\,du\}\psi_1(X)\} \le \int_B|h(u)|\,du \le L\Theta^{1/2}$.
Plainly $\Theta \le \min((1-q)tn^{-1}, 1)$, and this yields that

\[ \|H\|_* \le 8d^* + 2L\Theta^{1/2} + L\Theta \le 8d^* + 3L(1-q)^{1/2}t^{1/2}n^{-1/2}. \]

Further, let us consider $\|E\{H^2(X,Y)|X\}\|_\infty$. Using $(a + b + c + d)^2 \le
2a^2 + 4b^2 + 8(c^2 + d^2)$ we get

\[
\begin{aligned}
E\{H^2(x,Y)\} &\le 2E\{g^2(x-Y)\} + 4E^2\{g(x-Y)\} + 8E\{E^2\{g(X-Y)|Y\}\} + 8L^2\Theta^2 \\
&= 2E\Bigl[\int_B\cos(u(x-Y))\,du\Bigr]^2 + 4\Bigl[\mathrm{Re}\Bigl\{\int_Be^{iux}h(-u)\,du\Bigr\}\Bigr]^2 \\
&\quad + 8E\Bigl[\mathrm{Re}\Bigl\{\int_Be^{iuY}h(-u)\,du\Bigr\}\Bigr]^2 + 8L^2\Theta^2 \\
&\le (1/2)E\int_B\int_B(e^{iu(x-Y)} + e^{-iu(x-Y)})(e^{iv(x-Y)} + e^{-iv(x-Y)})\,du\,dv + 20L^2\Theta \\
&\le \int_B\int_B(|h(u-v)| + |h(u+v)|)\,du\,dv + 20L^2\Theta \\
&\le L[d_1 + 20(1-q)tLn^{-1}].
\end{aligned}
\]

Finally, we have a plain inequality $\sup_{x,y}|H(x,y)| \le 4L$. Using these results
and Lemma A.1 we get

\[ \Pr\Bigl\{\Bigl|\sum_{1\le l,m\le n}H(X_l, Y_m)\Bigr| > \gamma^2qtnL/c_1\Bigr\} \le c_2\exp\Bigl\{-\frac{\gamma^2q^2t^2L\nu_1}{dc_1^2c_2}\Bigr\}, \tag{A.10} \]

where

\[
\begin{aligned}
\nu_1 := \min\Bigl(&\frac{\gamma^2}{1 + 4L^{1/2}(1-q)tn^{-1}(2d^{-1/2} + 3d^{-1}L^{1/2}(1-q)tn^{-1})}, \\
&\frac{c_1d}{t[8d^* + 3L((1-q)t/n)^{1/2}]}, \quad \frac{d[c_1^4nt^{-4}L^{-2}]^{1/3}}{[d_1 + 20(1-q)Ltn^{-1}]^{1/3}}, \quad
\frac{c_1^{3/2}dn^{1/2}}{2t^{3/2}L}\Bigr).
\end{aligned} \tag{A.11}
\]

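To evaluate the second probability in (A.7), let us recall Bernstein's
inequality.

Lemma A.2. Let $Z_1, \ldots, Z_n$ be i.i.d., $|Z_1| \le M$ a.e., $E\{Z_1\} = 0$ and
$\mathrm{Var}(Z_1) = \sigma^2 < \infty$. Then for any $z > 0$

\[ \max\Bigl(\Pr\Bigl\{\sum_{l=1}^nZ_l < -z\Bigr\}, \Pr\Bigl\{\sum_{l=1}^nZ_l > z\Bigr\}\Bigr) \le \exp\Bigl\{-\frac{z^2}{2n\sigma^2 + (2/3)Mz}\Bigr\}. \tag{A.12} \]

This implies

\[
\begin{aligned}
&\Pr\Bigl\{\Bigl|\sum_{l=1}^nH(X_l, Y_l)\Bigr| > \gamma(1-\gamma)qtnL/c_1\Bigr\} \\
&\qquad \le 2\exp\Bigl\{-\frac{\gamma^2(1-\gamma)^2q^2t^2L}{n^{-1}Ltc_1^2[3c_1^{-1} + 2dL^{-1}t^{-1} + 8n^{-1}(2d^{1/2} + t)]}\Bigr\}.
\end{aligned}
\]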
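Bernstein's inequality (A.12) can be compared with an exact tail probability in a simple case. The sketch below assumes symmetric Rademacher variables $Z_l = \pm 1$, for which $M = 1$, $\sigma^2 = 1$ and the sum is an affine function of a binomial variable; this example and the function names are illustrative assumptions and play no role in the proof.

```python
import math

def bernstein_bound(z, n, sigma2, M):
    """Right-hand side of Bernstein's inequality for a sum of n bounded terms."""
    return math.exp(-z ** 2 / (2.0 * n * sigma2 + (2.0 / 3.0) * M * z))

def rademacher_upper_tail(n, z):
    """Exact Pr{Z_1 + ... + Z_n >= z} for i.i.d. Z_l = +1 or -1 with prob. 1/2.

    The sum equals 2K - n with K ~ Binomial(n, 1/2), so the event is
    K >= (n + z) / 2.
    """
    k_min = max(0, math.ceil((n + z) / 2))
    return sum(math.comb(n, k) for k in range(k_min, n + 1)) / 2 ** n
```

For instance, with `n = 100` one can tabulate `rademacher_upper_tail(100, z)` against `bernstein_bound(z, 100, 1.0, 1.0)` over a range of `z` to see how conservative the bound is in this case.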
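Let us consider the third probability in (A.7). Write

\[ A_2 = 2(n-1)n^{-2}\sum_{l=1}^nV_l, \qquad V_l := E\{g(X_l - Y)|X_l\} - E\{g(X - Y)\}. \]

Plainly $|V_l| \le \int_B|h(u)|\,du + L\Theta \le 2L\Theta^{1/2}$. Also, $E\{V_l\} = 0$ and

\[
\begin{aligned}
\mathrm{Var}(V_l) &\le E\int_B\int_B(1/4)[e^{iuX}h(-u) + e^{-iuX}h(u)] \\
&\qquad \times [e^{ivX}h(-v) + e^{-ivX}h(v)]\,du\,dv \\
&\le (1/2)d_1L\Theta.
\end{aligned}
\]

Then Lemma A.2, $n > 3$ and $\Theta \le (1-q)tn^{-1}$ imply that

\[
\begin{aligned}
\Pr\{A_2 > (1-\gamma)qtLn^{-1}\}
&\le \exp\Bigl\{-\frac{2^{-2}(1-\gamma)^2q^2t^2L^2[n/(n-1)]^2}{nd_1L\Theta + (2/3)2L\Theta^{1/2}2^{-1}(1-\gamma)qtL[n/(n-1)]}\Bigr\} \\
&\le \exp\Bigl\{-\frac{(1-\gamma)^2q^2t^2L}{4[(1-q)d_1t + (1-q)^{1/2}t^{3/2}Ln^{-1/2}]}\Bigr\}.
\end{aligned}
\]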



Combining the obtained inequalities in (A.7) implies [$\nu_1$ is defined in
(A.11)]

\[
\begin{aligned}
&\Pr\{\hat\Theta - \Theta > qtn^{-1}\}I(\Theta \le (1-q)tn^{-1}) \\
&\qquad \le c_1c_2\exp\{-\gamma^2q^2t^2L\nu_1/(dc_1^2c_2)\} \\
&\qquad\quad + 2c_1\exp\Bigl\{-\frac{\gamma^2(1-\gamma)^2q^2t^2L}{n^{-1}Ltc_1^2[3c_1^{-1} + 2dL^{-1}t^{-1} + 8n^{-1}(2d^{1/2} + t)]}\Bigr\} \\
&\qquad\quad + \exp\Bigl\{-\frac{(1-\gamma)^2q^2t^2L}{4[(1-q)d_1t + (1-q)^{1/2}t^{3/2}Ln^{-1/2}]}\Bigr\}.
\end{aligned} \tag{A.13}
\]

Set $q = 1 - (L+1)^{-1/2}$, $\gamma = 1 - \min(1/2, t^{1/4})$, and this, together with
(8.6), verifies (8.5). □